Abstract

In this note we introduce the notion of islands for restricting local search. We show how we 
can construct islands for CNF SAT problems, and how much search space can be eliminated by 
restricting search to the island. 

1 Background and Definitions 

In the following subsections, we give the necessary definitions and notations for subsequent discussion 
and presentation. 

1.1 SAT 

A (propositional) variable can take the value of either 0 (false) or 1 (true). A literal is either a
variable $x$ or its complement $\bar{x}$. A literal $l$ is true if $l$ assumes the value 1; $l$ is false otherwise. A
clause is a disjunction of literals, which is true when at least one of its literals is true. A Satisfiability (SAT)
problem consists of a finite set of variables and a finite set of clauses (treated as a conjunction).

A SAT problem is a special case of a CSP (Z, D,C): Z is the set of variables of the SAT problem, 
the domain of each variable is {0, 1}, and C contains all the clauses, each of which is considered a 
constraint in C restricting the values that the variables can take. 

Let $P = (Z, D, C)$ be a CSP. We use $var(c)$ to denote the set of variables that occur in constraint
$c \in C$. A valuation for variable set $\{x_1, \ldots, x_n\} \subseteq Z$ is a mapping from variables to values, denoted
$\{x_1 \mapsto a_1, \ldots, x_n \mapsto a_n\}$, where each $x_i$ is a variable and $a_i \in D_{x_i}$.

A state of $P$ (or $C$) is a valuation for $Z$. The projection $\pi(s, v)$ of a valuation $s$ on variable set
$v'$ onto a set of variables $v \subseteq v'$ is defined as

$$\pi(s, v) = \{x \mapsto a \mid (x \mapsto a \in s) \wedge (x \in v)\}.$$

A state $s$ is a solution of a constraint $c$ if $\pi(s, var(c))$ is a set of variable assignments which makes $c$
true. A state $s$ is a solution of a CSP $(Z, D, C)$ if $s$ is a solution to all constraints in $C$ simultaneously.
In the context of SAT problems, a solution makes all clauses true simultaneously. 

Since we are dealing with SAT problems, we will also use an alternate representation of a state
as a set of literals. A state $\{x_1 \mapsto a_1, \ldots, x_n \mapsto a_n\}$ corresponds to the set of literals
$\{x_j \mid a_j = 1\} \cup \{\bar{x}_j \mid a_j = 0\}$.

Unless stated otherwise, we understand constraints (or clauses) in a set as always conjoined.
Therefore, we abuse terminology by using the phrases "a conjunction of constraints (or clauses)"
and "a set of constraints (or clauses)" interchangeably.
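
The two state representations above can be sketched in a few lines of Python. The integer encoding used here (literal $x_i$ as the integer i, its complement as -i) and all function names are assumptions made for illustration; they do not come from this note.

```python
# Assumed encoding (not from the note): literal x_i is the int i, its complement is -i.

def state_to_literals(valuation):
    """Turn a valuation {variable: 0 or 1} into the equivalent set of literals."""
    return frozenset(v if a == 1 else -v for v, a in valuation.items())

def satisfies_clause(state, clause):
    """A clause (a set of literals) is true when it shares a literal with the state."""
    return bool(state & clause)

def is_solution(state, clauses):
    """A state solves a SAT problem when every clause is true under it."""
    return all(satisfies_clause(state, c) for c in clauses)

# Example: x1 = 1, x2 = 0, x3 = 1 against (x1 or x2) and (not x2 or not x3).
s = state_to_literals({1: 1, 2: 0, 3: 1})
clauses = [frozenset({1, 2}), frozenset({-2, -3})]
solved = is_solution(s, clauses)
```

With a state held as a set of literals, checking a clause reduces to a set intersection, which is the view the proofs in the later sections rely on.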

1.2 Local Search 

A local search solver moves from one state to another using a local move. We define the
neighbourhood $n(s)$ of a state $s$ to be all the states that are reachable in a single move from state $s$. The
neighbourhood states are meant to represent all the states reachable in one move, independent of
the actual heuristic function used to choose which state is moved to.

For the purpose of this paper, we assume the neighbourhood function $n(s)$ returns the states
which are at a Hamming distance of 1 from the starting state $s$. The Hamming distance between
states $s_1$ and $s_2$ is defined as

$$d_h(s_1, s_2) = |s_1 - (s_1 \cap s_2)| = |s_2 - (s_1 \cap s_2)|.$$

In other words, the Hamming distance measures the number of differences in variable assignment
between $s_1$ and $s_2$. This neighbourhood reflects the usual kind of local move in SAT solvers, flipping one
variable.
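
As a minimal sketch of the definitions above, states can be held as sets of integer literals (i for $x_i$, -i for its complement; an encoding assumed here, not taken from the note). The function names are hypothetical.

```python
def hamming(s1, s2):
    """d_h(s1, s2) = |s1 - (s1 & s2)|: the number of variables assigned differently."""
    return len(s1 - (s1 & s2))

def neighbours(s):
    """All states at Hamming distance 1 from s: flip exactly one variable."""
    return [(s - {lit}) | {-lit} for lit in s]

s = frozenset({1, -2, 3})          # x1 = 1, x2 = 0, x3 = 1
t = frozenset({1, 2, 3})           # differs from s only on x2
```

Each state over k variables has exactly k neighbours under this neighbourhood, one per variable flip.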

A local move from state $s$ is a transition, $s \Rightarrow s'$, from $s$ to $s' \in n(s)$. A local search procedure
consists of at least the following components:

• a neighbourhood function n for all states; 

• a heuristic function b that determines the "best" possible local move $s \Rightarrow s'$ for the current
state $s$; and

• possibly an optional "breakout" procedure to help escape from local minima. 

We note that the notion of noise, as it appears in some solvers such as WalkSAT, can be incorporated
into the heuristic function b. We also decouple the notion of neighbourhood from the heuristic
function since the two are orthogonal, although they are mixed together in the description
of a local move in GSAT, WalkSAT, and others.

2 Island Constraints 

We introduce the notion of island constraints, the solution space of which is connected in the following 
sense. Central to a local search algorithm is the definition of the neighbourhood of a state since each 
local move can only be made to a state in the neighbourhood of the current state. We say that a 
constraint is an island constraint if we can move from any state in the constraint's solution space to 
another using a sequence of local moves without moving out of the solution space. 

Let $sol(c)$ denote the set of all solutions to a constraint $c$, in other words the solution space of $c$.
A constraint $c$ is an island constraint (or simply island) if, for any two states $s_0, s_n \in sol(c)$, there
exist states $s_1, \ldots, s_{n-1} \in sol(c)$ such that $s_i \Rightarrow s_{i+1}$ for all $i \in \{0, \ldots, n-1\}$. A constraint $c$ with
$|sol(c)| \le 1$ is thus an island by definition. We call such islands trivial.
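
For small problems the island property can be checked directly by enumerating the solution space and testing whether it is connected under single-variable flips. The following is a brute-force sketch only, exponential in the number of variables; the integer literal encoding (i for $x_i$, -i for its complement) and the function names are assumptions made for illustration.

```python
from itertools import product

def all_states(variables):
    """Enumerate every state over the given variables as a set of literals."""
    for bits in product((0, 1), repeat=len(variables)):
        yield frozenset(v if b else -v for v, b in zip(variables, bits))

def is_island(clauses, variables):
    """True if any two solutions are linked by single flips that stay inside sol(C)."""
    sols = {s for s in all_states(variables) if all(s & c for c in clauses)}
    if len(sols) <= 1:
        return True                      # trivial island
    start = next(iter(sols))
    seen, stack = {start}, [start]
    while stack:                         # depth-first search within the solution space
        s = stack.pop()
        for lit in s:
            t = (s - {lit}) | {-lit}
            if t in sols and t not in seen:
                seen.add(t)
                stack.append(t)
    return seen == sols

single_clause = [frozenset({1, 2})]                     # x1 or x2
equivalence = [frozenset({1, -2}), frozenset({-1, 2})]  # x1 <-> x2
```

Because local moves are symmetric, reachability from one solution is enough to decide connectivity. The second example has two solutions at Hamming distance 2, so it is not an island.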

Immediately questions about islands arise: 

• When is a constraint an island? 

• Given $n$ islands $c_1, \ldots, c_n$ of different constraint types, when is the conjunction $c_1 \wedge \cdots \wedge c_n$
an island, if at all?



Before embarking on answering these questions, without loss of generality, we assume from now 
on that all clauses are in standard form: (1) no literals occur more than once in the same clause, and 
(2) no literal and its complement occur together in the same clause. This standard form requirement 
is easy to fulfill since we observe that 

$$\cdots \vee l \vee \cdots \vee l \vee \cdots \;=\; \cdots \vee l \vee \cdots$$

and

$$\cdots \vee l \vee \cdots \vee \bar{l} \vee \cdots \;=\; \mathit{true}$$

for any literal $l$.

Theorem 1 Any clause $c$ forms an island.

Proof: Consider two solutions $s_0$ and $s_n$ of $c$. Then (treating them as sets of literals) $s_0 \cap c \neq \emptyset$
and $s_n \cap c \neq \emptyset$. Choose $l_n \in s_n \cap c$. Clearly $s_1 = s_0 - \{\bar{l}_n\} \cup \{l_n\}$ is also a solution of $c$, and either
equals $s_0$ or is a neighbour of it. Now move from $s_1$ to $s_n$ by flipping, one at a time, the variables on which
they differ, none of which is the variable of $l_n$. Clearly each state in this sequence is a solution because it contains $l_n$. □
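
The proof of Theorem 1 is constructive, and the path it describes can be sketched as follows. The integer literal encoding (i for $x_i$, -i for its complement) and the name clause_path are assumptions made for this illustration; the order in which the remaining variables are flipped is an arbitrary choice.

```python
def clause_path(s0, sn, clause):
    """Return states from s0 to sn, one flip apart, all satisfying `clause`.

    Assumes s0 and sn are both solutions of `clause`.
    """
    ln = next(iter(sn & clause))          # a clause literal that is true in sn
    path, cur = [s0], s0
    if ln not in cur:                     # first make ln true (s0 stays a solution until then)
        cur = (cur - {-ln}) | {ln}
        path.append(cur)
    for lit in sorted(sn - cur, key=abs): # fix the remaining differences; ln is never touched
        cur = (cur - {-lit}) | {lit}
        path.append(cur)
    return path

clause = frozenset({1, 2})                # x1 or x2
s0 = frozenset({1, -2, -3})
sn = frozenset({-1, 2, 3})
path = clause_path(s0, sn, clause)
```

Here the path first sets $x_2$ to 1, after which $x_1$ and $x_3$ can be flipped freely because the clause stays satisfied through $x_2$.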



3 Non-Conflicting Clause Set 

We give a first sufficient condition for when a set $C$ of clauses results in an island. We note that any
solution to a clause must contain at least one assignment of the form $l/1$, where $l$ is a literal of the clause. The idea is to disallow
the simultaneous occurrence of $l$ and $\bar{l}$ in $C$. The intuition behind this restriction is as follows. Suppose
literal $l$ occurs in clause $c_i$ and $\bar{l}$ occurs in $c_j$. Suppose $l$ is 0. During the course of the local moves, it
might be necessary to set $l$ to 1. However, if $\bar{l}$ is the only literal in $c_j$ assuming the value 1, resetting
$l$ falsifies $c_j$, moving the trajectory out of $sol(C)$.

Let $lit(c)$ denote the set of all literals of a clause $c$. A set $C$ of clauses is non-conflicting if there
does not exist a variable $x$ such that $x, \bar{x} \in \bigcup \{lit(c) \mid c \in C\}$.

Theorem 2 A non-conflicting set $C$ of clauses forms an island.

Proof: Consequence of Theorem 3, proved in the following section. □



4 Primal Non-Conflicting Clause Set 

The requirement of the non-conflicting property on all variables is too stringent. It suffices to impose 
this restriction on only a subset of variables, in particular, only one variable from each clause. 

Without loss of generality, we impose an arbitrary total ordering < on the variables in a SAT
problem. With such a total ordering, it makes sense to talk about the least variable among a set of
variables. We say that $l$ is the <-primal literal, denoted by $p_<(c)$, of a clause $c$ if $var(l)$ is the least
among all variables in $var(c)$ using the < ordering.

Let $C$ be a set of clauses and < a variable ordering. The <-primal literal set of $C$, $pLit_<(C)$, is
the set of all <-primal literals of the clauses in $C$. In other words,

$$pLit_<(C) = \{p_<(c) \mid c \in C\}.$$

$C$ is <-primal non-conflicting if there does not exist a variable $x$ such that $x, \bar{x} \in pLit_<(C)$.

Lemma 1 Given a <-primal non-conflicting set $C$ of clauses with variable ordering <, any state
$s \supseteq pLit_<(C)$ is a solution of $C$.

Proof: Since every clause in $C$ contains a literal from $pLit_<(C)$, the variable assignments in $s$ make
at least one literal in each clause true. □

Lemma 1 gives a method to find a solution of $C$. This solution consists of any assignment that
makes the literals in $pLit_<(C)$ true. The assignments for variables not in $pLit_<(C)$ can be arbitrary.
For example, if $C$ has variables $\{x_1, \ldots, x_5\}$ and $pLit_<(C) = \{\bar{x}_2, x_4, \bar{x}_5\}$, then

$$\{x_1/1, x_2/0, x_3/1, x_4/1, x_5/0\}$$

is a solution of $C$. Note that the assignments for variables $x_1$ and $x_3$ can be arbitrary since they are
not in $pLit_<(C)$.
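
The definitions of this section and the construction of Lemma 1 can be sketched in Python as follows. The integer literal encoding (i for $x_i$, -i for its complement), the dictionary `rank` standing in for the ordering <, the function names, and the three example clauses are all assumptions made for illustration; the clauses were chosen only so that their primal literal set matches the example above.

```python
def primal_literal(clause, rank):
    """The literal of `clause` whose variable is least under the ordering `rank`."""
    return min(clause, key=lambda lit: rank[abs(lit)])

def primal_literal_set(clauses, rank):
    return {primal_literal(c, rank) for c in clauses}

def is_primal_non_conflicting(clauses, rank):
    primal = primal_literal_set(clauses, rank)
    return all(-lit not in primal for lit in primal)

def solution_from_primal(clauses, rank, free_value=1):
    """Lemma 1: make every primal literal true; other variables get `free_value`."""
    primal = primal_literal_set(clauses, rank)
    state = set(primal)
    for v in rank:
        if v not in primal and -v not in primal:
            state.add(v if free_value else -v)
    return frozenset(state)

rank = {v: v for v in range(1, 6)}        # ordering x1 < x2 < ... < x5
clauses = [frozenset({-2, 3, 5}), frozenset({4, 5}), frozenset({-5})]  # hypothetical
s = solution_from_primal(clauses, rank)
```

With `free_value=1` the sketch reproduces the assignment $\{x_1/1, x_2/0, x_3/1, x_4/1, x_5/0\}$; any other choice for $x_1$ and $x_3$ would also be a solution.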

Theorem 3 A <-primal non-conflicting set $C$ of clauses forms an island.

Proof: Given any solution $s$ of $C$, we construct a path of moves (remaining within the solutions of $C$)
from $s$ to some $s'$ where $s' \supseteq pLit_<(C)$. Clearly we can move from any solution $s \supseteq pLit_<(C)$ to another
$s' \supseteq pLit_<(C)$ simply by modifying literals not in $pLit_<(C)$; every intermediate state is a solution by Lemma 1.
Since local moves are reversible, we then have a path from any solution to any other.

Suppose $pLit_<(C) \not\subseteq s$. There must exist a least variable $x$ such that either $x \in s$ and $\bar{x} \in pLit_<(C)$,
or $\bar{x} \in s$ and $x \in pLit_<(C)$. Let $l$ be the literal in $s$ containing $x$. Define $s' = s - \{l\} \cup \{\bar{l}\}$.
Considering each clause $c \in C$ in turn, we show that $s'$ is a solution of $c$.

• $p_<(c) = \bar{l}$: Clearly $s'$ is a solution of $c$.

• $p_<(c) = l$: Contradiction, since $\bar{l} \in pLit_<(C)$ and $C$ is <-primal non-conflicting. Hence this
case cannot occur.

• $p_<(c)$ involves a variable $x' < x$: By the choice of $x$, we have $p_<(c) \in s$ and hence also $p_<(c) \in s'$.
Thus $s'$ is a solution of $c$.

• $p_<(c)$ involves a variable $x' > x$: Clearly the variable $x$ does not occur in $c$ (otherwise it would
give the primal literal). Since the only difference between $s'$ and $s$ is on $x$, $s'$ remains
a solution of $c$.

Since the number of literals in $s' \cap pLit_<(C)$ is one more than in $s \cap pLit_<(C)$, this process
eventually terminates in a solution $s \supseteq pLit_<(C)$. □
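
The walk used in the proof can be sketched as below: repeatedly flip the least variable on which the current state disagrees with the primal literal set. The integer literal encoding (i for $x_i$, -i for its complement), the `rank` dictionary standing in for <, the function name, and the example clauses are assumptions made for illustration. The sketch presumes its input is a solution of a <-primal non-conflicting clause set.

```python
def walk_to_primal(s, clauses, rank):
    """Path of single flips from solution s to a state containing pLit_<(C)."""
    primal = {min(c, key=lambda lit: rank[abs(lit)]) for c in clauses}
    path = [s]
    # Visiting primal literals in increasing variable order always picks
    # the least variable on which s still disagrees with the primal set.
    for lit in sorted(primal, key=lambda l: rank[abs(l)]):
        if lit not in s:
            s = (s - {-lit}) | {lit}
            assert all(s & c for c in clauses)   # Theorem 3: still a solution
            path.append(s)
    return path

rank = {1: 1, 2: 2, 3: 3, 4: 4}
clauses = [frozenset({1, 2}), frozenset({-2, 3}), frozenset({3, 4})]  # pLit = {x1, not x2, x3}
start = frozenset({-1, 2, 3, -4})
path = walk_to_primal(start, clauses, rank)
```

In this example the walk flips $x_1$ and then $x_2$; flipping them in the opposite order would falsify the first clause, which is why the proof insists on the least disagreeing variable.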

Note that the total ordering on variables is entirely arbitrary. It gives us a consistent way of
picking a primal literal for each clause $c$, and thus of moving from any solution to any other through
the primal literal set.

A direct consequence of Theorem 3 is its contrapositive, stated as follows.

Corollary 1 If a set $C$ of clauses is satisfiable but not an island, then there exists no ordering <
such that $C$ is <-primal non-conflicting.

Consider an island C formed from a set of constraints. If every subset of C is also an island, we 
say that C is compositional. 

Proposition 1 Given any total ordering < on variables, islands formed from <-primal
non-conflicting sets of clauses are compositional.

    procedure islandExtr(C: in, L: out, Q: out)
    begin
        L ← [];
        Q ← ∅;
        while C ≠ ∅ do
            pick the "best" literal l in C;
            L ← L ++ [l];
            Q ← Q ∪ {all clauses in C containing l};
            C ← C − {all clauses in C containing either l or \bar{l}};
        end while
    end

Figure 1: The islandExtr greedy algorithm

Proof: Suppose the set C of clauses is <-primal non-conflicting. We observe that every subset of 
C is also <-primal non-conflicting. Therefore, every subset of C is an island. □ 

We shall see later that compositionality is important for the dynamic version of the Island
Confinement Method. The converse of Proposition 1 does not hold. Consider the simple island

$$C = (x_1 \vee x_2 \vee x_3) \wedge (\bar{x}_1 \vee \bar{x}_2 \vee \bar{x}_3)$$

which is compositional since any individual clause forms an island. We can also easily verify that
there exists no ordering < that makes $C$ <-primal non-conflicting. This is because the two clauses
$c_1$ and $c_2$ in $C$ are "mirror images" of each other in the sense that for every literal $l$ in $c_1$, $\bar{l}$ is in
$c_2$, and vice versa. Thus, no matter what the ordering < is, we would have both $l$ and $\bar{l}$ in the
<-primal literal set. This means that the <-primal non-conflicting property is only a sufficient but
not a necessary condition for compositional islands, or even just islands. The search for a more exact
characterization of islands continues.
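
Both claims about this example are small enough to check exhaustively. The sketch below enumerates the eight states and the six variable orderings; the integer literal encoding (i for $x_i$, -i for its complement) and the helper names are assumptions made for illustration.

```python
from itertools import permutations, product

VARS = [1, 2, 3]
C = [frozenset({1, 2, 3}), frozenset({-1, -2, -3})]   # the mirror-image clauses

def solutions(clauses):
    """All states (as literal sets) over VARS that satisfy every clause."""
    states = (frozenset(v if b else -v for v, b in zip(VARS, bits))
              for bits in product((0, 1), repeat=len(VARS)))
    return {s for s in states if all(s & c for c in clauses)}

def connected(sols):
    """True if single-variable flips link all states in `sols` without leaving it."""
    if len(sols) <= 1:
        return True
    start = next(iter(sols))
    seen, stack = {start}, [start]
    while stack:
        s = stack.pop()
        for lit in s:
            t = (s - {lit}) | {-lit}
            if t in sols and t not in seen:
                seen.add(t)
                stack.append(t)
    return seen == sols

def primal_non_conflicting(clauses, order):
    rank = {v: i for i, v in enumerate(order)}
    primal = {min(c, key=lambda lit: rank[abs(lit)]) for c in clauses}
    return all(-lit not in primal for lit in primal)

is_island = connected(solutions(C))
some_order_works = any(primal_non_conflicting(C, o) for o in permutations(VARS))
```

The six solutions (every state except all-0 and all-1) form a single cycle under one-variable flips, while every ordering places some $x_i$ and its complement in the primal literal set.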

On the other hand, we show in the next two sections that <-primal non-conflicting sets cover a
large class, although not all, of islands, and are useful in practice. Given a SAT problem $C$, we give
a greedy algorithm to compute a <-primal non-conflicting subset of $C$. Our results show that this
subset covers over 80% of the clauses on average over 11 benchmarks from the DIMACS archive.

5 A Greedy Algorithm 

Figure 1 gives a simple greedy algorithm, islandExtr, for extracting a <-primal non-conflicting subset
of clauses from an arbitrary set of clauses. The input to the algorithm is a set $C$ of clauses, and the
output is a <-primal non-conflicting set $Q \subseteq C$ of clauses plus the <-primal literal set $L$ (stored
as a list) of $Q$. The ordering of the literals in the list $L$ induces a variable ordering <, which is divided
into two parts. The ordering of the variables in $L$ follows the ordering of their corresponding
literals in $L$. The ordering among variables not in $L$ can be arbitrary, but they must all be greater
than the variables in $L$. It should be noted that $L$, which is essentially a sequenced version of $pLit_<(Q)$,
also gives a solution to the output island $Q$ by Lemma 1.

The islandExtr algorithm works as follows. Initially $L$ and $Q$ are empty, ready to accumulate
results. While there are still clauses in $C$, the algorithm finds the "best"
literal $l$ in $C$; we defer the discussion of the notion of "best" to the next paragraph.
This "best" literal will be the <-primal literal of all clauses containing $l$ in $C$, which are
added to $Q$ to become part of the <-primal non-conflicting set that we are computing. That is why
$l$ is appended to $L$. The ++ operator stands for list concatenation. Now clauses containing $l$ can be
removed from $C$ since they are already in $Q$. Clauses containing $\bar{l}$ must also be removed: $\bar{l}$ would be
the <-primal literal of these clauses, so they can never qualify to be added to $Q$. This process is
repeated until $C$ becomes empty.

    Problem        |C|     |Q|              |var(C)|   |n(L)|
    aim_100_1_6    160     150 (93.8%)      100        38 (38%)
    hanoi4         4934    4065 (82.4%)     718        197 (27.4%)
    f600           2550    2134 (83.7%)     600        183 (30.5%)
    f2000          8500    7072 (83.2%)     2000       624 (31.2%)

Table 1: Greedy Algorithm on Hard DIMACS Problems

The objective of the islandExtr algorithm is to collect as many clauses from $C$ as possible for $Q$,
which is determined directly by the choice of $l$ in each step of the loop. We encode greedy heuristics
in the selection of the "best" literal. One naive approach is to select the literal $l$ that occurs in the
greatest number of clauses in $C$. What could go wrong, however, is that a large number of clauses
containing $\bar{l}$ might also be removed as a result of this selection. Therefore, the greedy heuristic
should strike a careful balance between the number of clauses containing $l$ and those containing
$\bar{l}$. The idea is that the benefit gained from selecting $l$ should outweigh the penalty for removing
clauses containing $\bar{l}$. Some possibilities are to choose the literal $l$ that maximizes one of the following
expressions:

• $-\#(\bar{l})$,

• $\#(l) - \#(\bar{l})$,

• $\#(l)/\#(\bar{l})$, and

• $\#(l)/(\#(l) + \#(\bar{l}))$,

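where $\#(l)$ denotes the number of clauses containing $l$ as a literal in $C$. Note that the third and
the fourth expressions, $\#(l)/\#(\bar{l})$ and $\#(l)/(\#(l) + \#(\bar{l}))$, are equivalent in the sense that they rank
any two literals $l_1$ and $l_2$ in the same way. Comparing the fourth expression for $l_1$ and $l_2$ and
cross-multiplying gives

$$\#(l_1)\#(l_2) + \#(l_1)\#(\bar{l}_2) > \#(l_1)\#(l_2) + \#(l_2)\#(\bar{l}_1),$$

which implies

$$\#(l_1)\#(\bar{l}_2) > \#(l_2)\#(\bar{l}_1),$$

that is, $\#(l_1)/\#(\bar{l}_1) > \#(l_2)/\#(\bar{l}_2)$.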

Each expression above gives a different metric to measure the "efficiency" of $l$ over $\bar{l}$ as compared
to other literals in $C$. More complex heuristics can be devised, but we should bear in mind that
greedy algorithms are supposed to be simple and efficient.
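
A minimal Python sketch of the greedy extraction of Figure 1 follows. It is an illustration under stated assumptions, not the implementation used for the experiments: literals are integers (i for $x_i$, -i for its complement), the score is the ratio $\#(l)/\#(\bar{l})$ with 1 added to the denominator to avoid division by zero, and ties are broken by variable index. The smoothing, the tie-break, and the example clauses are all choices made here.

```python
def island_extr(clauses):
    """Greedy sketch of islandExtr: returns (L, Q) for a list of clauses."""
    remaining = [frozenset(c) for c in clauses]
    L, Q = [], []

    def count(lit):
        """#(lit): number of remaining clauses containing lit."""
        return sum(1 for c in remaining if lit in c)

    def score(lit):
        # Ratio heuristic; the +1 is an assumption to keep the ratio defined.
        return count(lit) / (count(-lit) + 1)

    while remaining:
        candidates = {lit for c in remaining for lit in c}
        # Sorting first makes the choice among equal scores deterministic.
        ordered = sorted(candidates, key=lambda lit: (abs(lit), lit))
        best = max(ordered, key=score)
        L.append(best)
        Q.extend(c for c in remaining if best in c)
        remaining = [c for c in remaining if best not in c and -best not in c]
    return L, Q

clauses = [{1, 2}, {1, -3}, {-1, 4}, {2, 3}, {-2, -4}]   # hypothetical input
L, Q = island_extr(clauses)
```

On this input the sketch selects $x_1$, then $x_3$, then $\bar{x}_2$, and keeps four of the five clauses; the clause $\bar{x}_1 \vee x_4$ is discarded when $x_1$ is chosen. The literals in L, taken together, satisfy every clause of Q, as Lemma 1 predicts.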

Table 1 gives the result of applying the islandExtr algorithm to four hard problems in the DIMACS
archive. The expression "$\#(l)/\#(\bar{l})$" is used to select the best literal. These are large problems
containing 100 to 2000 variables. The first column contains the problem names. The second column
gives the number of clauses. The third gives the number of clauses of the extracted island and its
associated percentage. The fourth column gives the total number of variables. The last column,
denoted by $|n(L)|$, gives the size of the neighbourhood of the initial solution (obtained from $L$ using
Lemma 1) restricted to only states on the island. For example, each state in "aim_100_1_6" (which
has 100 variables) has 100 neighbouring states. If we restrict our attention to only states in the
island extracted, the initial solution has only 38 neighbouring states.
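
The quantity $|n(L)|$ can be computed by trying each single-variable flip of the initial solution and keeping those that still satisfy every island clause. The sketch below assumes the integer literal encoding (i for $x_i$, -i for its complement); the function name and the small island are hypothetical and unrelated to the benchmark instances.

```python
def island_neighbourhood_size(state, island):
    """Count single-flip neighbours of `state` that satisfy every clause of `island`."""
    return sum(
        1 for lit in state
        if all(((state - {lit}) | {-lit}) & c for c in island)
    )

island = [frozenset({1, 2}), frozenset({1, -3}), frozenset({2, 3}), frozenset({-2, -4})]
initial = frozenset({1, -2, 3, 4})   # makes the primal literals x1, x3, not-x2 true
size = island_neighbourhood_size(initial, island)
```

In this toy case only the flip of $x_4$ stays on the island, so one of the four neighbours survives the restriction.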

To further demonstrate the benefits of identifying islands in a SAT problem, we performed the
same experiment on a set of small problems, also from the DIMACS archive. Each of these problems
contains 20 variables and 91 clauses. Therefore, the size of the entire search space of each problem
is $2^{20} = 1{,}048{,}576$ in terms of the number of states. We choose small problems so that we can use
a complete search algorithm to find the size of the search space of the extracted islands and the
number of solutions, which are reported in the third and fourth columns of Table 2. The number
and percentage of clauses of the extracted islands are reported in the second column of the table.

Table 2: Greedy Algorithm on Easy DIMACS Problems
[Table body not recoverable; surviving entries: 1535, 75 (94.9%), 2521, 23]
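
For problems of this size, the island search space and the number of solutions can be found by plain enumeration. The following sketch shows the idea on a hypothetical 4-variable problem rather than on the DIMACS instances; the integer literal encoding (i for $x_i$, -i for its complement) and the function name are assumptions.

```python
from itertools import product

def count_states(clauses, n_vars):
    """Number of states over variables 1..n_vars that satisfy every clause."""
    count = 0
    for bits in product((False, True), repeat=n_vars):
        # Literal i is true when bits[i-1] is True; literal -i when it is False.
        if all(any(bits[abs(l) - 1] == (l > 0) for l in c) for c in clauses):
            count += 1
    return count

island = [{1, 2}, {1, -3}, {2, 3}, {-2, -4}]   # hypothetical extracted island
problem = island + [{-1, 4}]                   # island plus the remaining clause

total = 2 ** 4
island_size = count_states(island, 4)
n_solutions = count_states(problem, 4)
```

Here the island holds 5 of the 16 states and contains both solutions of the full problem. With 20 variables the same loop visits $2^{20}$ states, which is still feasible for a complete enumeration.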

Of the eleven benchmarks that we tried, the islands contain on average over 80% of the total
number of clauses of the corresponding problems. Experiments on the smaller problems also
demonstrate an actual reduction of three orders of magnitude in the search space of the islands over that
of the original problems. Of course, the question remains whether the smaller search space actually
helps the local search algorithm.